Eagle2-9B is the latest Vision-Language Model (VLM) released by NVIDIA, designed to balance performance against inference speed. It is built on the Qwen2.5-7B-Instruct language model and a combined SigLIP and ConvNeXt vision model, and it supports multilingual and multimodal tasks.
Image-to-Text
Transformers Other